Refinement Type Inference via Abstract Interpretation 

Ranjit Jhala Rupak Majumdar Andrey Rybalchenko 

UCSD UCLA TUM 

jhala@cs.ucsd.edu rupak@cs.ucla.edu rybal@in.tum.de 



Abstract 

Refinement Types are a promising approach for checking behav- 
ioral properties of programs written using advanced language fea- 
tures like higher-order functions, parametric polymorphism and re- 
cursive datatypes. The main limitation of refinement type systems 
to date is the requirement that the programmer provides the types 
of all functions, after which the type system can check the types 
and hence, verify the program. 

In this paper, we show how to automatically infer refinement 
types, using existing abstract interpretation tools for imperative 
programs. In particular, we demonstrate that the problem of refine- 
ment type inference can be reduced to that of computing invari- 
ants of simple, first-order imperative programs without recursive 
datatypes. As a result, our reduction shows that any of the wide 
variety of abstract interpretation techniques developed for impera- 
tive programs, such as polyhedra, counterexample guided predicate 
abstraction and refinement, or Craig interpolation, can be directly 
applied to verify behavioral properties of modern software in a fully
automatic manner. 

1. Introduction 

Automatic verification of semantic properties of modern program- 
ming languages is an important step toward reliable software 
systems. For higher-order programming languages with inductive 
datatypes or polymorphic instantiation, the main verification tool 
has been type systems, which traditionally capture only coarse 
data-type properties (such as ints are only added to ints), and 
require the programmer to explicitly annotate program invariants if 
more precise invariants about program computations are required. 

For example, refinement type systems [33] associate data types
with refinement predicates that capture richer properties of program
computation. Using refinement types, one can state, for instance,
that a program variable xs has the refinement type "non-zero integer,"
or that the integer division function has the refinement type
int → {ν : int | ν ≠ 0} → int, which states that the second argument
must be non-zero. Then, if a program with refinement types
type-checks, one can assert that there is no division-by-zero error in
the program. The idea of using refinement types to express precise
program invariants is well known [3, 10, 12, 13, 27, 33]. However, in
each of the above systems, the programmer must provide refine- 
ments for each program type, and the type system checks the pro- 
vided type refinements for consistency. We believe that this burden 
of annotations has limited the widespread adoption of refinement 
type systems. 

For imperative programming languages, algorithms based on
abstract interpretation can be used to automatically infer many
program invariants [2, 6, 16], thereby proving many semantic properties
of practical interest. However, these tools do not precisely model 
modern programming features such as closures and higher-order 
functions or inductive datatypes, and so in practice, they are too 
imprecise when applied to higher-order programs. 

In this paper, we present an algorithm to automatically verify
properties of higher-order programs through refinement type in- 
ference (RTI) by combining refinement type systems for higher- 
order programs with invariant synthesis techniques for first-order 
programs. Our main technical contribution is a translation from 
type constraints derived from a refinement type system for higher- 
order programs to a first-order imperative program with assertions, 
such that the assertions hold in the first-order program iff there is 
a refinement type that makes the higher-order program type-check. 
Moreover, a suitable type refinement for the higher-order program 
can be constructed from the invariants of the first-order program. 
Thus, our algorithm replaces the manual annotation burden for
refinement types with automatically constructed invariants of the
translated program, enabling fully automatic verification of
programs written in modern languages.

The RTI algorithm (Figure 1) proceeds in three steps.

Step 1: Type-Constraint Generation. First, it performs Hindley-Milner
type inference [11] to construct ML types for the program,
and uses these types to generate refinement templates, i.e., types in
which refinement variables κ represent the unknown
refinement predicates. Then, the algorithm uses a standard
syntax-directed procedure to generate subtyping constraints over the
templates such that the program type checks (i.e., is safe) if the
subtyping constraints are satisfiable [3, 19, 29, 33].

Step 2: Translation. Second, it translates the set of type constraints 
to a first-order, imperative program over base values such that the 
type constraints are satisfiable if and only if the imperative program 
does not violate any assertions. 

Step 3: Abstract Interpretation. Finally, an abstract interpretation
technique for first-order imperative programs is used to prove that
the first-order program is safe. The proof of safety produced by
this analysis automatically translates to solutions to the refinement
type variables, thus generating refinement types for the original ML
program.

The main contribution of this paper is the RTI translation
algorithm. The advantage of the translation is that it allows one
to apply any of the well-developed semantic imperative program
analyses based on abstract interpretation (e.g., polyhedra [9] and
octagons [6], counterexample-guided predicate abstraction refinement
(CEGAR) [2, 16], Craig interpolation [16, 22], constraint-based
invariant generation [4, 30], random interpretation [15], etc.)
to the verification of modern software with polymorphism, inductive
datatypes, and higher-order functions. Instead of painstakingly
reworking each semantic analysis for imperative programs to the
higher-order setting, possibly re-implementing them in the process,
one can use our translation, and apply any existing analysis as is.
In fact, using the translation, our implementation directly uses a
CEGAR and interpolation based safety verification tool to verify
properties of OCAML programs.

In essence, our algorithm separates syntactic reasoning about 
function calls and inductive data types (handled well by typing 
constraints) from semantic reasoning about data invariants (handled 
well by abstract domains). The translation from refinement type 
constraints to imperative programs in Step 2 is the key enabler. The 
translation, and the proof that the satisfiability of type constraints 
and safety of the translated program are equivalent, are based on 
the following observations. 

The first observation is that refinement type variables κ define
relations over the value being defined by the refinement type and
the finitely many variables that are in scope at the point where
the type is defined. In the imperative program, each finite-arity
relation can be represented by a relation-valued variable.
Each refinement type constraint can be encoded as a straight-line
block that reads tuples from and writes tuples to the relation
variables, and the set of constraints can be encoded as a
non-terminating while-loop that, in each iteration,
non-deterministically executes one of the blocks. Thus, the problem
of determining the existence of appropriate relations reduces to that
of computing (over-approximations of) the set of tuples in each
relation variable in the translated program (Theorem 1).

Our second observation is that if the translated program is in a
special read-write-once form, where within each straight-line block
a relation variable is read and written at most once, then one can
replace all relation-valued variables with variables whose values
range over tuples (Theorem 2). Moreover, we prove that we can,
without affecting satisfiability, preprocess the refinement typing
constraints so that the translated program is a read-write-once
program (Theorem 3). Together, the observations yield a simple and
direct translation from refinement type inference to simple
imperative programs.

We have instantiated our algorithm in a verification tool for
OCAML programs. Our implementation generates refinement type
constraints using the algorithm of [29], and uses the ARMC [28]
software model checker to verify the translated programs. This
allows fully automatic verification of a set of OCAML benchmarks
for which previous approaches either required manual annotations
(either the refinement types [33] or their constituent
predicates [29]), or an elaborate customization and adaptation of
the counterexample-guided abstraction refinement paradigm [31].
Thus, we show, for the first time, how abstract interpretation can be
lifted "as-is" to practical refinement type inference for modern,
higher-order languages.

While we have focused on the verification of functional pro- 
grams, our approach is language independent, and requires only an 
appropriate refinement type system for the source language. 



let rec iteri i xs f =
  match xs with
  | [] -> ()
  | x :: xs' -> f i x;
                iteri (i+1) xs' f

let mask a xs =
  let g j y = a.(j) <- y && a.(j) in
  if Array.length a = List.length xs then
    iteri 0 xs g

Figure 2. ML Example 

2. Overview 

We begin with an example that illustrates how our refinement type 
inference (RTI) algorithm combines type constraints and abstract 
interpretation to automatically verify safety properties of functional 
ML programs with higher-order functions and recursive structures. 
We show that the combination of syntactic type constraints and 
semantic abstract interpretation enables the automatic verification 
of properties that are currently beyond the scope of either technique 
in isolation. 

An ML Example. Figure 2 shows a simple ML program that
updates an array a using the elements of the list xs. The program
comprises two functions. The first is a higher-order list
indexed-iterator, iteri, that takes as arguments a starting index i, a
(polymorphic) list xs, and an iteration function f. The iterator goes
over the elements of the list and invokes f on each element and the
index corresponding to the element's position in the list. The second
is a client, mask, of the iterator iteri that takes as input a boolean
array a and a list of boolean values xs, and if the lengths match,
calls the indexed iterator with an iteration function g that masks the
j-th element of the array.
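
To make the behavior of the two functions concrete, the following sketch restates the program of Figure 2 and runs it on two small inputs. The arrays and lists are hypothetical values chosen only for illustration; they do not come from the paper's benchmarks.

```ocaml
(* Figure 2, restated so that the block is self-contained. *)
let rec iteri i xs f =
  match xs with
  | [] -> ()
  | x :: xs' ->
    f i x;
    iteri (i + 1) xs' f

let mask a xs =
  let g j y = a.(j) <- y && a.(j) in
  if Array.length a = List.length xs then iteri 0 xs g

(* Hypothetical inputs, chosen only for illustration. *)
let a = [| true; false; true |]
let () = mask a [ true; true; false ]
(* a is now [| true; false; false |] *)

let b = [| true; true |]
let () = mask b [ false ]
(* the lengths differ, so b is left unchanged *)
```

In the first call every index passed to g lies in 0..2, which is the property the rest of this section sets out to verify statically.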

Suppose that we wish to statically verify the safety of the array
reads and writes in function g; that is, to prove that whenever g is
invoked, 0 ≤ j < len(a). As this example combines higher-order
functions, recursion, data structures, and arithmetic constraints on
array indices, it is difficult to analyze automatically using either 
existing type systems or abstract interpretation implementations in 
isolation. The former do not precisely handle arithmetic on indices, 
and the latter do not precisely handle higher-order functions and are 
often imprecise on data structures. We show how our RTI technique 
can automatically prove the correctness of this program. 

Refinement Types. To verify the program, we compute program
invariants that are expressed as refinements of ML types with
predicates over program values [3, 19, 29]. The predicates are additional
constraints that must be satisfied by every value of the type. A
base value, say of type int, can be described by the refinement
type {ν : int | p} where ν is a special value variable representing
the value being defined, and p is a refinement predicate which
constrains the range of ν to a subset of integers. For example, the type
{ν : int | 0 ≤ ν < len(a)} denotes the set of integers c that are
between 0 and the value of the expression len(a). Thus, the
unrefined type int abbreviates {ν : int | true}, which does not
constrain the set of integers. Base types can be combined to construct
dependent function types, written x : T1 → T2, where T1 is the type
of the domain, T2 the type of the range, and where the name x for
the formal parameter can appear in the refinement predicates in T2.
For example, the type

    x : {ν : int | ν ≥ 0} → {ν : int | ν = x + 1}

is the type of a function which takes a non-negative integer
parameter and returns an output which is one more than the input. In the
following, we write τ for the type {ν : τ | true}. When ν and τ are
clear from the context, we write {p} for {ν : τ | p}.
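
As a rough intuition for what such a type asserts, the dependent function type above can be mimicked with run-time checks. This is only a dynamic analogue under stated assumptions: refinement types are checked statically, and the helper `refine` and the function `succ_nat` are hypothetical names introduced here, not part of the paper.

```ocaml
(* A refinement {v : int | p}, modeled as a run-time check of p.
   [refine] is a hypothetical helper used only for illustration. *)
let refine (name : string) (p : int -> bool) (v : int) : int =
  if p v then v else invalid_arg ("refinement violated: " ^ name)

(* Dynamic analogue of  x : {v : int | v >= 0} -> {v : int | v = x + 1}.
   Note that the output refinement mentions the parameter x. *)
let succ_nat (x : int) : int =
  let x = refine "v >= 0" (fun v -> v >= 0) x in
  refine "v = x + 1" (fun v -> v = x + 1) (x + 1)
```

Calling `succ_nat 3` returns 4, while `succ_nat (-1)` raises `Invalid_argument`; a refinement type checker would instead reject the second call at compile time.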



Safety Specification. Refinement types can be used to specify
safety properties by encoding pre-conditions into primitive
operations of the language. For example, consider the array read a.(j)
(resp. write a.(j) <- e) in g, which is an abbreviation for get a j
(resp. set a j e). By giving get and set the refinement types

    a : α array → {ν : int | 0 ≤ ν < len(a)} → α ,

    a : α array → {ν : int | 0 ≤ ν < len(a)} → α → unit ,

we can specify that in any program the array accesses must be
within bounds. More generally, arbitrary safety properties can be
specified by giving assert the appropriate refinement type [29].
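
The pre-conditions carried by these two types can be sketched as run-time assertions. This is a minimal illustration, assuming that a dynamic check is an acceptable stand-in for the static one; the definitions below shadow nothing from the paper and the sample array is hypothetical.

```ocaml
(* Dynamic analogue of  a : 'a array -> {v : int | 0 <= v < len(a)} -> 'a  *)
let get (a : 'a array) (j : int) : 'a =
  assert (0 <= j && j < Array.length a);
  a.(j)

(* Dynamic analogue of
   a : 'a array -> {v : int | 0 <= v < len(a)} -> 'a -> unit  *)
let set (a : 'a array) (j : int) (e : 'a) : unit =
  assert (0 <= j && j < Array.length a);
  a.(j) <- e

(* Hypothetical usage. *)
let a = [| 10; 20; 30 |]
let () = set a 1 (get a 0 + get a 2)
(* a.(1) is now 40; [get a 3] would fail the bounds assertion *)
```

The goal of refinement type inference is to discharge these assertions once and for all, so that no such check can fail on any execution.
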

Safety Verification. The ML type system is too imprecise to prove
the safety of the array accesses in our example as it infers that g
has type j : int → y : bool → unit, i.e., that g can be called with
any integer j. If the programmer manually provides the refinement
types for all functions and polymorphic type instantiations,
refinement-type checking [3, 12, 33] can be used to verify that the
provided types are consistent and strong enough to prove safety.
This is analogous to providing pre- and post-conditions and
loop invariants for verifying imperative programs. For our example, the
refinement type system could check the program if the programmer
provided the types:

    iteri :: i : int → xs : {ν : α list | 0 ≤ len(ν)} →
             (j : {i ≤ ν < len(xs)} → α → unit) → unit
    g     :: j : {0 ≤ ν < len(a)} → bool → unit

Here, we omitted refinement predicates that are equal to true, e.g.,
for i in the type of iteri.

Automatic Verification via RTI. As even this simple example il- 
lustrates, the type annotation burden for verification is extremely 
high. Instead, we would like to verify the program without requir- 
ing the programmer to provide every refinement type. The RTI al- 
gorithm proceeds in three steps. First, we syntactically analyze the 
source program to generate subtyping constraints over refinement 
templates. Second, we translate the constraints into an equivalent 
simple imperative target program. Third, we semantically analyze 
the target program to determine whether it is safe, from which we 
conclude that the constraints are satisfiable and hence, the source 
program is safe. Next, we illustrate these steps using Figure 2 as
the source program. 

2.1 Step 1: Constraint Generation 

In the first step, we generate a system of refinement type constraints
for the source program [19, 29]. To do so, we (a) build templates
that refine the ML types with refinement variables that stand for
the unknown refinements, and (b) make a syntax-directed pass over
the program to generate subtyping constraints that capture the flow
of values. For the functions iteri and g from Figure 2, with the
respective ML types

    i : int → xs : α list → (j : int → α → unit) → unit
    j : int → bool → unit

we would generate the respective templates

    i : int → xs : {0 ≤ len(ν)} → (j : {κ1} → α → unit) → unit
    j : {κ2} → bool → unit

Notice that these templates simply refine the ML types with
refinement variables κ1, κ2 that stand for the unknown refinements. For
clarity of exposition, we have added the refinement true for some
variables (e.g., for the type α and bool); our system would
automatically infer the unknown refinements. We model the length of
lists (resp. arrays) with an uninterpreted function len from the lists
(resp. arrays) to integers, and (again, for brevity) add the refinement
stating xs has a non-negative length in the type of iteri.



After creating the templates, we make a syntax-directed pass
over the program to generate constraints that capture relationships
between refinement variables. There are two kinds of type
constraints: well-formedness and subtyping.

Well-formedness Constraints capture scoping rules, and ensure
that the refinement predicate for a type can only refer to variables
that are in scope. Our example has two constraints:

    i : int; xs : α list ⊢ {ν : int | κ1}            (w1)
    a : bool array; xs : α list ⊢ {ν : int | κ2}     (w2)

The first constraint states that κ1, which represents the unknown
refinement for the first parameter passed to the higher-order
iterator iteri, can only refer to the two program variables that are
in scope at that point, namely i and xs. Similarly, the second
constraint states that κ2, which refines the first argument of g, can only
refer to a and xs, which are in scope where g is defined.

Subtyping Constraints reduce the flow of values within the
program into subtyping relationships that must hold between the
source and target of the flow. Each constraint is of the form

    G ⊢ T1 <: T2

where G is an environment comprising a sequence of type bindings,
and T1 and T2 are refinement templates. The constraint intuitively
states that under the environment G, the type T1 must be a subtype
of T2. The subtyping constraints are generated syntactically from
the code. First consider the function iteri. The call to f generates

    G ⊢ {ν = i} <: {κ1}                              (c1)

where the environment G comprises the bindings

    G = i : {true}; xs : {0 ≤ len(ν)};
        x : {true}; xs' : {0 ≤ len(ν) = len(xs) - 1}

The constraint ensures that at the call site, the type of the actual
is a subtype of the formal. The bindings in the environment are simply
the refinement templates for the variables in scope at the point the
value flow occurs. The type system yields the information that the
length of xs' is one less than xs as the former is the tail of the
latter [18, 33]. Similarly, the recursive call to iteri generates

    G ⊢ {j : κ1 → α → unit} <:
        {(j : κ1 → α → unit)[i + 1/i][xs'/xs]}

which states that the type of the actual f is a subtype of the third
formal parameter of iteri after applying substitutions [i + 1/i]
and [xs'/xs] that capture the passing in of the actuals i + 1
and xs' for the first two parameters respectively. By pushing the
substitutions inside and applying the standard rules for function
subtyping, this constraint simplifies to

    G ⊢ {κ1[i + 1/i][xs'/xs]} <: {κ1}                (c2)

Next, consider the function mask. The array accesses inside g
generate the "bounds-check" constraint

    G'; j : {κ2}; y : {true} ⊢ {ν = j} <: {0 ≤ ν < len(a)}    (c3)

where G' = a : bool array; xs : {0 ≤ len(ν)} has bindings for
the other variables in scope. Finally, the flow due to the third
parameter for the call to iteri yields

    G'; len(a) = len(xs) ⊢ {j : κ2 → τ} <: {(j : κ1 → τ)[0/i]}

where for brevity we write τ for bool → unit, and omit the trivial
substitution [xs/xs] due to the second parameter. The last conjunct
in the environment captures the guard from the if under whose
auspices the call occurs. By pushing the substitutions inside and
applying standard function subtyping, the above reduces to

    G'; len(a) = len(xs) ⊢ {κ1[0/i]} <: {κ2}         (c4)

For brevity we omit trivial constraints like · ⊢ int <: int. If the
set of constraints constructed above is satisfiable, then there is a
valid refinement typing of the program [29], and hence the program
is safe.

2.2 Step 2: Translation to Imperative Program 

Determining the satisfiability of the constraints requires semantic 
analysis about program computations. In the second step, our key 
technical contribution, we show a translation that reduces the con- 
straint satisfiability problem to checking the safety of a simple, im- 
perative program. Our translation is based on two observations. 

Refinements are Relations. The first observation is that type
refinements are defined through relations: the set of values denoted
by a refinement type {ν : τ | p}, where p refers to the program
variables x1, ..., xn of the respective base types τ1, ..., τn, is
equivalent to the set

    {t0 | ∃(t1, ..., tn) s.t. (t0, t1, ..., tn) ∈ Rp ∧
          t1 = x1 ∧ ... ∧ tn = xn}

where Rp is an (n + 1)-ary relation in τ × τ1 × ... × τn defined
by p. For example, the set of values denoted by {ν : int | ν ≤ i}
is equivalent to the set:

    {t0 | ∃t1 s.t. (t0, t1) ∈ R≤ ∧ t1 = i} ,

where R≤ is the standard ≤-ordering relation over the integers. In
other words, each refinement variable κ can be seen as the
projection on the first co-ordinate of an (n + 1)-ary relation over the
variables (ν, x1, ..., xn), where x1, ..., xn are the variables in the
well-formedness constraint for κ (i.e., the variables in scope of κ).
Thus, the problem of determining the satisfiability of the constraints
is analogous to the problem of determining the existence of
appropriate relations.

Relations are Records. The second observation is that the problem
of finding appropriate relations can be reduced to the problem
of analyzing a simple imperative program with variables ranging
over relations. In the imperative program, each refinement variable,
standing for an n-ary relation, is translated into a record variable
with n fields. Each subtyping constraint can be translated into a
block of reads-from and writes-to the corresponding records. The
set of all tuples that can be written into a given record on some
execution of the program defines the corresponding relation. The
entire program is an infinite loop, which in each iteration
non-deterministically chooses a block of reads and writes defined by
a constraint.

The arity of a relation, and hence the number of fields of the
corresponding record, is determined by the well-formedness
constraints. For example, the constraint (w1) specifies that κ1
corresponds to a ternary relation, that is, a set of triples where the 0th
element (corresponding to ν) is an integer, the 1st element
(corresponding to i) is an integer, and the 2nd element (corresponding to
xs) is a list. We encode this in the imperative program via a record
variable κ1 with three fields κ1.0, κ1.1 and κ1.2.

Figure 3 shows the imperative program translated from the
constraints for our running example. We use the subtyping constraints
to define the flow of tuples into records. For example, consider the
constraint (c2), which is translated to the block marked /*c2*/.
Each variable in the type environment is translated to a
corresponding variable in the program. The block has a sequence of
assignments that define the environment variables. For example, we know
i has type int, so there is an assignment of an arbitrary integer
to i. When there is a known refinement in the binding, the
non-deterministic assignment is followed by an assume operation (a
conditional) that establishes that the value assigned satisfies the
given refinement. For example, xs gets assigned an arbitrary value,
but then the assume establishes the fact that the length of xs is
non-negative. Similarly, xs' gets assigned an arbitrary value that has
non-negative length and whose length is 1 less than that of xs. The
LHS of (c2) reads a tuple from κ1 whose first and second fields are
assumed to equal i + 1 and xs' respectively. Finally, the triple
(ν, i, xs) is written into the record κ1, which is the RHS of (c2).

Next, consider the translated block for the bounds-check
constraint (c3). Here, the translation is as before but the RHS is a
known refinement predicate (that stipulates the integer be within 
bounds). In this case, instead of writing into the record that defines 
the RHS, the translation contains an assertion over the correspond- 
ing variables that ensures that the refinement predicate holds. 

Relational vs. Imperative Semantics. There is a direct
correspondence between the refinement-relations and the record variables
when the translated program is interpreted under a Relational
semantics, where (1) the records range over (initially empty) sets of
tuples, (2) each write adds a new tuple to the record's set, and
(3) each read non-deterministically selects some tuple from the
record's set. Under these semantics, we can show that the
constraints are satisfiable iff the imperative program is safe (i.e., no
assert fails on any execution) (Theorem 1).
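
The Relational semantics can be explored on the running example with a small bounded simulation. The sketch below is not the paper's implementation (which uses ARMC on the translated program); it rests on several assumptions that are ours: lists and arrays are abstracted by their lengths, indices and lengths are limited to a hypothetical `bound`, and the tuple read in block c3 is taken to be one whose second field is the length of the array in scope. Each record is a set of integer triples, each writing block adds tuples, and the blocks are iterated until no record grows.

```ocaml
(* Records under the Relational semantics: sets of integer triples. *)
module Tuples = Set.Make (struct
  type t = int * int * int
  let compare = compare
end)

let bound = 4 (* hypothetical bound on indices and lengths *)
let range lo hi = List.init (hi - lo + 1) (fun k -> lo + k)

(* c1: write (v, i, len xs) with v = i; xs has a tail, so len xs >= 1. *)
let c1 k1 =
  List.fold_left
    (fun acc i ->
      List.fold_left (fun acc n -> Tuples.add (i, i, n) acc) acc (range 1 bound))
    k1 (range 0 bound)

(* c2: read (t0, t1, t2) with t1 = i + 1 and t2 = len xs' = len xs - 1,
   then write (t0, i, len xs). *)
let c2 k1 =
  Tuples.fold
    (fun (t0, t1, t2) acc ->
      let i = t1 - 1 and n = t2 + 1 in
      if i >= 0 && n <= bound then Tuples.add (t0, i, n) acc else acc)
    k1 k1

(* c4: read (t0, t1, t2) from k1 with t1 = 0 and len a = len xs = t2,
   then write (t0, len a, len xs) into k2. *)
let c4 k1 k2 =
  Tuples.fold
    (fun (t0, t1, t2) acc ->
      if t1 = 0 then Tuples.add (t0, t2, t2) acc else acc)
    k1 k2

(* Iterate the writing blocks until no record gains a tuple. *)
let rec fix k1 k2 =
  let k1' = c2 (c1 k1) in
  let k2' = c4 k1' k2 in
  if Tuples.equal k1 k1' && Tuples.equal k2 k2' then (k1, k2) else fix k1' k2'

let k1, k2 = fix Tuples.empty Tuples.empty

(* c3: every tuple readable from k2 must pass the bounds assertion.
   Assumption: the tuple's second field is the length of the array in scope. *)
let safe = Tuples.for_all (fun (j, len_a, _) -> 0 <= j && j < len_a) k2

let () =
  Printf.printf "safe = %b, |k1| = %d, |k2| = %d\n" safe
    (Tuples.cardinal k1) (Tuples.cardinal k2)
```

Within the chosen bound the assertion of c3 holds for every tuple that reaches κ2. A bounded run of this kind is only a sanity check; establishing safety for all executions is the job of the invariant generator in Step 3.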

Unfortunately, these semantics preclude the direct application
of mature invariant generation and safety verification techniques,
e.g., those based on abstract interpretation or CEGAR-based
software model checking, as those techniques do not deal well with
set-valued variables. We would like to have an imperative
semantics where each record contains a single value, the last tuple written
to it. We show that there is a syntactic subclass of programs for
which the two semantics coincide. That is, a program in the
subclass is safe under the imperative semantics if and only if it is safe
under the set-based semantics (Theorem 2). Furthermore, we show
a technique that ensures that the translated program belongs to the
subclass (Theorem 3).

The attractiveness of the translation is that the resulting pro- 
grams fall in a particularly pleasant subclass of programs which do 
not have any advanced language features like higher-order func- 
tions, polymorphism, and recursive data structures, or variables 
over complex types such as sets, that are the bane of semantic anal- 
yses. Thus, the translation yields simple imperative programs to 
which a wide variety of semantic analyses directly apply. 

2.3 Step 3: Invariant Generation

Together these results imply that we can run off-the-shelf abstract 
interpretation and invariant generation tools on the translated pro- 
gram, and use the result of the analysis to determine whether the 
original ML program is typable. 

For the translated program shown in Figure 3, the CEGAR-based
software model checker ARMC [28] finds that the assertion
is never violated, and computes the invariants:

    κ1.1 ≤ κ1.0 ∧ κ1.0 < len(κ1.2)
    0 ≤ κ2.0 < len(κ2.1)

which, when plugging in ν, i and xs for the 0th, 1st, 2nd fields
of κ1 and ν, a for the 0th, 1st fields of κ2 respectively, yield the
refinements

    κ1 = i ≤ ν < len(xs)        κ2 = 0 ≤ ν < len(a)



loop{ /*c1*/
      i ← nondet();
      xs ← nondet(); assume(0 ≤ len(xs));
      xs' ← nondet(); assume(0 ≤ len(xs') = len(xs) - 1);
      ν ← nondet(); assume(ν = i);
      κ1 ← (ν, i, xs)
   [] /*c2*/
      i ← nondet();
      xs ← nondet(); assume(0 ≤ len(xs));
      xs' ← nondet(); assume(0 ≤ len(xs') = len(xs) - 1);
      (t0, t1, t2) ← κ1;
      assume(t1 = i + 1);
      assume(t2 = xs');
      ν ← t0;
      κ1 ← (ν, i, xs)
   [] /*c3*/
      a ← nondet();
      xs ← nondet(); assume(0 ≤ len(xs));
      (t0, t1, t2) ← κ2;
      j ← t0;
      assert(0 ≤ j < len(a))
   [] /*c4*/
      a ← nondet();
      xs ← nondet(); assume(0 ≤ len(xs));
      assume(len(a) = len(xs));
      (t0, t1, t2) ← κ1;
      assume(t1 = 0);
      assume(t2 = xs);
      ν ← t0;
      κ2 ← (ν, a, xs)
}

Figure 3. Translated Program 

which suffice to typecheck the original ML program. Indeed, these
predicates for κ1 and κ2 are easily shown to satisfy the constraints (c1),
(c2), (c3), and (c4).

3. Constraints 

We start by formalizing constraints over types refined with
predicates. To this end, we make precise the notions of refinement
predicates (Section 3.1), refinement types (Section 3.2), and constraints
over refinement types and the notion of satisfaction (Section 3.3).

A discussion of how such constraints can be generated in a 
syntax-guided manner from program source is outside the scope 
of this paper; we refer the reader to the large body of prior research 
that addresses this issue [3, 19, 29, 33].

Notation. We use uppercase (Z) to denote sets, lowercase z to
denote elements, and ⟨Z⟩ for a sequence of elements in Z.

3.1 Refinement Logic 

Figure 4 shows the syntax of refinement predicates. In our
discussion, we restrict the predicate language to the typed quantifier-free
logic of linear integer arithmetic and uninterpreted functions. How- 
ever, it is straightforward to extend the logic to include other do- 
mains equipped with effective decision procedures and abstract in- 
terpreters. 

Types and Environments. Our logic is equipped with a fixed set of
types denoted τ, comprising the basic types int for integer values,
bool for boolean values, and u_i, a family of uninterpreted types
that are used to encode complex source language types such as
products, sums, polymorphic type variables, recursive types, etc.
We assume there is a fixed set of uninterpreted functions. Each
uninterpreted function f has a fixed type τ_f = ⟨τ_f^i⟩ → τ_f^o. An
environment Γ is a sequence of variable-type bindings.



Expressions and Predicates. In our logic, expressions e comprise
variables, linear arithmetic (i.e., addition and multiplication by
constants), and applications of uninterpreted functions f. Note that, as is
standard in semantic program analyses, complex operations like
division or non-linear multiplication can be modelled using uninterpreted
functions. Finally, predicates comprise atomic comparisons of
expressions, or boolean combinations of sub-predicates. We write
true (resp. false) as abbreviations for 0 = 0 (resp. 0 = 1).

Well-formedness. We say that a predicate p is well-formed in an
environment Γ if every variable appearing in p is bound in Γ and p
is "type correct" in the environment Γ.

Validity. For each type τ, we write U(τ) to denote the set of
concrete values of τ. An interpretation σ is a map from variables x
to concrete values, and functions f to maps from U(⟨τ_f^i⟩) to U(τ_f^o).
We say that σ is valid under Γ if for each x : τ ∈ Γ, we have
σ(x) ∈ U(τ). We say that a predicate p is valid in an environment
Γ if σ(p) evaluates to true for every σ valid under Γ.
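
The logic just described can be sketched as a small abstract syntax with an evaluator. This is a simplified illustration under assumptions of ours: every type is collapsed to int, the constructor and field names are hypothetical, and `holds_on_range` tests a predicate over a bounded range of one variable, which is a sanity check and not a decision procedure for validity.

```ocaml
type expr =
  | Var of string
  | Const of int
  | Add of expr * expr
  | Mul of int * expr          (* multiplication by a constant *)
  | App of string * expr list  (* uninterpreted function application *)

type pred =
  | Le of expr * expr
  | Eq of expr * expr
  | Not of pred
  | And of pred * pred
  | Or of pred * pred

(* An interpretation: values for variables and for function symbols. *)
type interp = { var : string -> int; fn : string -> int list -> int }

let rec eval_expr s = function
  | Var x -> s.var x
  | Const c -> c
  | Add (e1, e2) -> eval_expr s e1 + eval_expr s e2
  | Mul (c, e) -> c * eval_expr s e
  | App (f, es) -> s.fn f (List.map (eval_expr s) es)

let rec eval_pred s = function
  | Le (e1, e2) -> eval_expr s e1 <= eval_expr s e2
  | Eq (e1, e2) -> eval_expr s e1 = eval_expr s e2
  | Not p -> not (eval_pred s p)
  | And (p, q) -> eval_pred s p && eval_pred s q
  | Or (p, q) -> eval_pred s p || eval_pred s q

let tt = Eq (Const 0, Const 0) (* true  abbreviates 0 = 0 *)
let ff = Eq (Const 0, Const 1) (* false abbreviates 0 = 1 *)

(* Bounded check over a single variable x; other symbols are fixed to 0. *)
let holds_on_range x lo hi p =
  let s v = { var = (fun y -> if y = x then v else 0); fn = (fun _ _ -> 0) } in
  List.for_all (fun v -> eval_pred (s v) p)
    (List.init (hi - lo + 1) (fun k -> lo + k))

(* x <= 0 \/ 1 <= x holds for every integer; x <= 0 does not. *)
let always = holds_on_range "x" (-5) 5 (Or (Le (Var "x", Const 0), Le (Const 1, Var "x")))
let sometimes = holds_on_range "x" (-5) 5 (Le (Var "x", Const 0))
```

Here `always` evaluates to true and `sometimes` to false, mirroring the difference between a valid predicate and a merely satisfiable one.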

3.2 Refinement Types 

Figure 4 shows the syntax of refinement types and environments.

Refinements. A refinement r is either a predicate p drawn from
our logic, or a refinement variable with pending substitutions
κ[y1/x1] ... [yn/xn]. Intuitively, the former represent known
refinements (or invariants), while the latter represent the unknown
invariants that hold of different program values. The notion of
pending substitutions [1, 19] offers a flexible way of capturing the value
flow that arises in the context of function parameter passing (in the
functional setting), or assignment (in the imperative setting), even
when the underlying invariants are unknown.

Refinement Types and Environments. A refinement type
{ν : τ | r} is a triple consisting of a value variable ν denoting the
value being described by the refinement type, a type τ describing
the underlying type of the value, and a refinement r. A refinement
environment G is a sequence of refinement type bindings.

The value variables are special variables distinct from the
program variables, and can occur inside the refinement
predicates. Thus, intuitively, the refinement type describes the set
of concrete values of the underlying type τ which additionally
satisfy the refinement predicate. For example, the refinement
type {ν : int | ν ≠ 0} describes the set of non-zero integers and
{ν : int | ν = x + y} describes the set of integers whose value
equals the sum of the values of the (program) variables x and y.

Note that path-sensitive branch information can be captured
by adding suitable bindings to the refinement environment. For
example, the fact that some expression is only evaluated under the
if-condition that $x > 100$ can be captured in the environment via a
refinement type binding $x_b : \{\nu{:}\mathtt{bool} \mid x > 100\}$.

3.3 Refinement Constraints and Solutions 

Figure 4 shows the syntax of refinement constraints. Our refinement
type system has two kinds of constraints.
Subtyping Constraints are of the form

$$G \vdash \{\nu{:}\tau \mid r_1\} <: \{\nu{:}\tau \mid r_2\}$$

Intuitively, a subtyping constraint states that when the program
variables satisfy the invariants described in $G$, the set of values
described by the refinement $r_1$ must be subsumed by the set of
values described by the refinement $r_2$.
Well-formedness Constraints are of the form $\Gamma \vdash \{\nu{:}\tau \mid r\}$.
Intuitively, a well-formedness constraint states that the refinement $r$
must be a well-typed predicate in the environment $\Gamma$ extended with
the binding $\nu{:}\tau$ for the value variable.

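Embedding. To formalize the notions of constraint validity and
satisfaction, we embed subtyping constraints into our logic. We define
the function $\mathrm{Emb}(\cdot)$ that maps refinement types, environments and
subtyping constraints to predicates in our logic.

$$\begin{aligned}
\mathrm{Emb}(\{\nu{:}\tau \mid p\}) &= p \\
\mathrm{Emb}(x{:}T; G) &= \mathrm{Emb}(T)[x/\nu] \land \mathrm{Emb}(G) \\
\mathrm{Emb}(\emptyset) &= \mathit{true} \\
\mathrm{Emb}(G \vdash T_1 <: T_2) &= \mathrm{Emb}(G) \land \mathrm{Emb}(T_1) \Rightarrow \mathrm{Emb}(T_2)
\end{aligned}$$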

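Similarly, we define the function $\mathrm{Shape}(\cdot)$ that maps refinement
types and environments to types and environments in our logic.

$$\begin{aligned}
\mathrm{Shape}(\{\nu{:}\tau \mid p\}) &= \tau \\
\mathrm{Shape}(x{:}T; G) &= x{:}\mathrm{Shape}(T); \mathrm{Shape}(G) \\
\mathrm{Shape}(\emptyset) &= \emptyset
\end{aligned}$$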

Validity. A subtyping constraint $G \vdash T_1 <: T_2$ that does
not contain refinement variables is valid if the predicate
$\mathrm{Emb}(G \vdash T_1 <: T_2)$ is valid under environment $\mathrm{Shape}(G)$. A
well-formedness constraint $\Gamma \vdash \{\nu{:}\tau \mid p\}$ that does not contain
refinement variables is valid if the predicate $p$ is well-formed in the
environment $\Gamma$.
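The embedding and the validity check can be illustrated on constraints without refinement variables. The following Python sketch is an assumption-laden illustration rather than the decision procedure used by a real solver: all variables are taken to be integers, predicates are Python functions of the value variable and an interpretation, and validity is only tested by enumeration over a small range, which can refute a constraint but never proves it. The names `emb_env`, `emb_subtype` and `valid_bounded` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Sigma = Dict[str, int]                    # interpretation of the program variables
Pred = Callable[[int, Sigma], bool]       # predicate over nu and the program variables
Binding = Tuple[str, Pred]                # x : {nu:int | p}


def emb_env(G: List[Binding], sigma: Sigma) -> bool:
    """Emb(G) under sigma: each binding's predicate with x substituted for nu."""
    return all(p(sigma[x], sigma) for x, p in G)


def emb_subtype(G: List[Binding], p1: Pred, p2: Pred, nu: int, sigma: Sigma) -> bool:
    """Emb(G |- T1 <: T2) = (Emb(G) and Emb(T1)) implies Emb(T2)."""
    return (not (emb_env(G, sigma) and p1(nu, sigma))) or p2(nu, sigma)


def valid_bounded(G: List[Binding], p1: Pred, p2: Pred, lo: int = -4, hi: int = 4) -> bool:
    """Test the embedded implication for every assignment with values in [lo, hi]."""
    names = [x for x, _ in G]
    values = range(lo, hi + 1)
    for vals in product(values, repeat=len(names)):
        sigma = dict(zip(names, vals))
        if not all(emb_subtype(G, p1, p2, nu, sigma) for nu in values):
            return False
    return True


# x:{nu > 0}; y:{nu > 0} |- {nu = x + y} <: {nu > x}
G = [("x", lambda nu, s: nu > 0), ("y", lambda nu, s: nu > 0)]
t1 = lambda nu, s: nu == s["x"] + s["y"]
t2_ok = lambda nu, s: nu > s["x"]
t2_bad = lambda nu, s: nu > 5

if __name__ == "__main__":
    print(valid_bounded(G, t1, t2_ok))
    print(valid_bounded(G, t1, t2_bad))
```

The second constraint is refuted by the assignment x = 1, y = 1, nu = 2, which satisfies the environment and the left-hand side but not the right-hand side.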

Relational Interpretations. We assume, without loss of generality,
that each refinement variable $\kappa$ is associated with a unique
well-formedness constraint $x_1{:}\tau_1; \ldots; x_n{:}\tau_n \vdash \{\nu{:}\tau_0 \mid \kappa\}$ called
the well-formedness constraint for $\kappa$. In this case, we say $\kappa$ has
arity $n + 1$. Furthermore, we assume that wherever a $\kappa$ of arity
$n + 1$ appears in a subtyping constraint, it appears with a sequence
of $n$ pending substitutions $[y_1/x_1] \ldots [y_n/x_n]$. This assumption
is without loss of generality, as we can enforce it with trivial
substitutions of the form $[x_i/x_i]$. A relational interpretation for $\kappa$
of arity $n + 1$ is an $(n+1)$-ary relation in $\mathcal{U}(\tau_0) \times \ldots \times \mathcal{U}(\tau_n)$. A
relational model is a map from refinement variables $\kappa$ to relational
interpretations.

Constraint Satisfaction. A set of constraints $C$ is satisfiable if
for all interpretations for uninterpreted functions $f$, there exists a
relational model $S$ such that, when each occurrence of a refinement
type $\{\nu{:}\tau \mid \kappa[y_1/x_1] \ldots [y_n/x_n]\}$ in $C$ is substituted with

$$\{\nu{:}\tau \mid \exists t_1, \ldots, t_n.\ S(\kappa)(\nu, t_1, \ldots, t_n) \land t_1 = y_1 \land \ldots \land t_n = y_n\}$$

every subtyping constraint after the substitution is valid. In this
case, we say that $S$ is a solution for $C$.

4. Imperative Programs 

RTI translates the satisfiability problem for refinement type con- 
straints to the question of checking the safety of an imperative pro- 
gram in a simple imperative language IMP. In this section, we for- 
malize the syntax of IMP programs and define the Relational se- 
mantics and the Imperative semantics. 

4.1 Syntax 

Figure 5 shows the syntax of IMP programs. An instruction ($I$)
is a sequence of assignments, assumptions and assertions. A program
($P$) is an infinite loop over a block, whose body is a
non-deterministic choice between a finite number of instructions
$I_1, \ldots, I_n$. Next, we describe the different kinds of instructions.
For ease of notation, we assume that there is only one base type $\tau$,
and let $V$ denote the set of values of type $\tau$.

Variables. Imp programs have two kinds of variables. (1) base variables,
denoted by $\nu$, $x$, $y$ and $t$ (and subscripted versions thereof),
which range over values of type $\tau$. (2) relation variables, denoted
by $\kappa$, each of which has a fixed arity $n$ and ranges over tuples of
values or sets of $n$-tuples of values depending on the semantics.

Base Assignments. Imp programs have two kinds of assignments
to base variables. Either (1) an expression over base variables
(cf. Figure 4) is evaluated and assigned to the base variable, or
(2) an arbitrary value of the appropriate base type is assigned
to the base variable, i.e., the variable is "havoc-ed" with a
non-deterministically chosen value.

[Figure 4. Predicates, Refinements and Constraints. The figure gives the
syntax of types (base type of integers, base type of booleans, complex
uninterpreted type), environments (binding, empty), expressions (variable,
integer, addition, affine multiplication, function application),
predicates (comparison, negation, conjunction, implication), refinements
(predicate, refinement variable with substitutions), refinement types,
refinement environments (binding, empty), subtype constraints and WF
constraints.]

$$\begin{array}{rcll}
I & ::= & & \textit{Instructions:} \\
  & \mid & x \leftarrow e & \text{assign expr} \\
  & \mid & x \leftarrow \mathtt{nondet}() & \text{havoc} \\
  & \mid & (t_0, \ldots, t_n) \leftarrow \kappa & \text{get tuple} \\
  & \mid & \kappa \leftarrow (x_0, \ldots, x_n) & \text{set tuple} \\
  & \mid & \mathtt{assume}(p) & \text{assume} \\
  & \mid & \mathtt{assert}(p) & \text{assert} \\
  & \mid & I_1; I_2 & \text{sequence} \\
P & ::= & \mathtt{loop}\{I_1 \;[]\; \ldots \;[]\; I_n\} & \textit{Program}
\end{array}$$

Figure 5. Imperative Programs: Syntax

Tuple Assignments. The operations get tuple and set tuple respec- 
tively read a tuple from and write a tuple to a relation variable. 

Assumes and Asserts. Imp programs have the standard assume
and assert instructions using predicates over the base variables (cf.
Figure 4). We write skip as an abbreviation for assume(0 = 0).

4.2 Relational Semantics 

We define the Relational semantics as a state transition system. In
this semantics, $\kappa$ variables range over sets of tuples over $V$.

Relational States. A state $s^\sharp$ in the Relational semantics is either
the special error state $\mathcal{E}$ or a map from program variables to values
such that every base variable is mapped to a value in $V$, and every
relation variable of arity $n$ is mapped to a (possibly empty) set of
tuples in $V^n$. Let $\Sigma^\sharp$ be the set of all Relational-program states.

For a state $s^\sharp$ which is not $\mathcal{E}$, variable $x$ and value $v$ we write
$s^\sharp[x \mapsto v]$ for the map which maps $x$ to $v$ and every other key $x'$ to
$s^\sharp(x')$. We lift maps $s^\sharp$ from base variables to values to maps from
expressions (and predicates) to values in the natural way.

Initial State. The initial state $s_0^\sharp$ of an Imp program in the
Relational semantics is a map in which every base variable is mapped
to a fixed value from $V$, and every relation variable is mapped to
the empty set.

Transition Relation. The transition relation is defined through a
$\mathrm{Post}^\sharp$ operator, shown in Figure 6, which maps a state $s^\sharp$ and an
instruction $I$ to the set of states that the program can be in after
executing the instruction from the state $s^\sharp$. We lift $\mathrm{Post}^\sharp$ to a set of
states $\hat\Sigma^\sharp \subseteq \Sigma^\sharp$ in the natural way:

$$\mathrm{Post}^\sharp(\hat\Sigma^\sharp, I) = \bigcup \{\mathrm{Post}^\sharp(s^\sharp, I) \mid s^\sharp \in \hat\Sigma^\sharp\}$$

Notice that the program halts if a get instruction is executed with 
an empty relation variable, or an assume(p) is executed in a state 
that does not satisfy p. 

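Safety. Let $P$ be the program $\mathtt{loop}\{I_1 \;[]\; \ldots \;[]\; I_n\}$. The set of
Relational-reachable states of $P$, denoted $\mathrm{Reach}^\sharp(P)$, is defined by
induction as:

$$\begin{aligned}
\mathrm{Reach}^\sharp(P, 0) &= \{s_0^\sharp\} \\
\mathrm{Reach}^\sharp(P, m + 1) &= \bigcup \{\mathrm{Post}^\sharp(\mathrm{Reach}^\sharp(P, m), I_j) \mid 1 \le j \le n\} \\
\mathrm{Reach}^\sharp(P) &= \bigcup \{\mathrm{Reach}^\sharp(P, m) \mid 0 \le m\}
\end{aligned}$$

A program $P$ is Relational-safe if $\mathcal{E} \notin \mathrm{Reach}^\sharp(P)$.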
4.3 Imperative Semantics 

Next, we define the Imperative semantics, as a state transition
system. In this semantics, $\kappa$ variables range over tuples over $V$.

Imperative States. In the Imperative semantics, each state $s$ is
either the special error state $\mathcal{E}$ or a map from program variables
to values such that every base variable is mapped to a value in $V$,
and every relation variable of arity $n$ is mapped either to a tuple in
$V^n$ or to the special undefined value $\bot$. Let $\Sigma$ denote the set of all
Imperative-program states.

Initial State. The initial state $s_0$ of an Imp program in the
Imperative semantics is a map in which every base variable is mapped to
a fixed value from $V$, and every relation variable is mapped to $\bot$.

Transition Relation. The transition relation is defined using a Post
operator, which is identical to $\mathrm{Post}^\sharp$ in the Relational semantics
except for the tuple-get and tuple-set instructions. Figure 6 shows
the operator Post for get and set operations. Again, Post is lifted
to a set of states in the natural way. Notice that the program halts if
a get instruction is executed with an undefined relation variable, or
an assume(p) is executed in a state that does not satisfy p.

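Safety. Let $P$ be the program $\mathtt{loop}\{I_1 \;[]\; \ldots \;[]\; I_n\}$. The set of
Imperative-reachable states of $P$, denoted $\mathrm{Reach}(P)$, is defined by
induction as:

$$\begin{aligned}
\mathrm{Reach}(P, 0) &= \{s_0\} \\
\mathrm{Reach}(P, m + 1) &= \bigcup \{\mathrm{Post}(\mathrm{Reach}(P, m), I_j) \mid 1 \le j \le n\} \\
\mathrm{Reach}(P) &= \bigcup \{\mathrm{Reach}(P, m) \mid 0 \le m\}
\end{aligned}$$

A program $P$ is Imperative-safe if $\mathcal{E} \notin \mathrm{Reach}(P)$.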

5. From Type Constraints to Imp Programs 

In this section we formalize the translation from type constraints 
into Imp programs and prove that the constraints are satisfiable if 
and only if the translated program is safe. 



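Refinement Type Translation

$$\begin{aligned}
[\![\{\nu{:}\tau \mid p\}]\!]_{get} &= \nu \leftarrow \mathtt{nondet}();\ \mathtt{assume}(p) \\
[\![\{\nu{:}\tau \mid p\}]\!]_{set} &= \mathtt{assert}(p) \\
[\![\{\nu{:}\tau \mid \kappa[y_1 \ldots y_n/x_1 \ldots x_n]\}]\!]_{get} &= (t_0, \ldots, t_n) \leftarrow \kappa;\ \mathtt{assume}(y_1 = t_1);\ \ldots;\ \mathtt{assume}(y_n = t_n);\ \nu \leftarrow t_0 \\
[\![\{\nu{:}\tau \mid \kappa[y_1 \ldots y_n/x_1 \ldots x_n]\}]\!]_{set} &= \kappa \leftarrow (\nu, y_1, \ldots, y_n)
\end{aligned}$$

Binding Translation

$$\begin{aligned}
[\![x{:}T; G]\!] &= [\![T]\!]_{get};\ x \leftarrow \nu;\ [\![G]\!] \\
[\![\emptyset]\!] &= \mathtt{skip}
\end{aligned}$$

Constraint Translation

$$[\![G \vdash T_1 <: T_2]\!] = [\![G]\!];\ [\![T_1]\!]_{get};\ [\![T_2]\!]_{set}$$

Constraint Set Translation

$$[\![\{c_1, \ldots, c_n\}]\!] = \mathtt{loop}\{[\![c_1]\!] \;[]\; \ldots \;[]\; [\![c_n]\!]\}$$

Figure 7. Translating Constraints to Imp Programs
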
5.1 Translation 

Figure 7 formalizes the translation from (a set of) refinement type
constraints $C$ to an Imp program $[\![C]\!]$. We use the WF constraints to
translate each relation variable $\kappa$ of arity $n+1$ into a corresponding
tuple variable $\kappa$ of arity $n+1$.

The translation is syntax-driven. We translate each subtyping
constraint $G \vdash T_1 <: T_2$ into a straight-line block of instructions
with three parts: a sequence of instructions that establishes the
environment bindings ($[\![G]\!]$), a sequence of instructions that "gets"
the values corresponding to the LHS ($[\![T_1]\!]_{get}$) and a sequence
of instructions that "sets" the (LHS) values into the appropriate
RHS ($[\![T_2]\!]_{set}$). The translation for a set of constraints is an infinite
loop that non-deterministically chooses among the blocks for each
constraint.

Each environment binding gets translated as a "get". Bindings
with unknown refinements are translated into tuple-get operations,
followed by assume statements that establish the equalities
corresponding to the pending substitutions. Bindings with known
refinements are translated into non-deterministic assignments followed
by an assume that enforces that the refinement holds on the
non-deterministic value.

Each "set" operation to an unknown refinement is translated 
into a tuple-set instruction that writes the tuple corresponding to 
the pending substitutions into the translated tuple variable. Finally, 
each "set" operation corresponding to a known refinement is trans- 
lated to an assert instruction; intuitively, in such constraints the 
RHS defines an upper bound on the set of values populating the 
type, and the assert serves to enforce the upper bound require- 
ment in the translated program. 
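The syntax-driven translation can be sketched as a small program that emits the instructions as text. This is an illustrative Python sketch, not the implementation described in this paper: the tuple encoding of refinements, the helper names (`translate_get`, `translate_set`, `translate_env`, `translate_constraint`, `translate`) and the concrete textual syntax of the emitted instructions are all assumptions made for the example.

```python
from typing import List, Tuple, Union

# A refinement is either ("pred", "<predicate text>") or
# ("kvar", name, [y1, ..., yn]) where the ys are the pending substitutions.
Refinement = Union[Tuple[str, str], Tuple[str, str, List[str]]]
Binding = Tuple[str, Refinement]                           # x : {nu | r}
Constraint = Tuple[List[Binding], Refinement, Refinement]  # G |- T1 <: T2


def translate_get(r: Refinement) -> List[str]:
    if r[0] == "pred":                    # known refinement: havoc, then assume
        return ["nu <- nondet()", f"assume({r[1]})"]
    _, k, ys = r                          # unknown refinement: tuple-get plus equalities
    ts = [f"t{i}" for i in range(len(ys) + 1)]
    code = [f"({', '.join(ts)}) <- {k}"]
    code += [f"assume({y} = t{i})" for i, y in enumerate(ys, start=1)]
    return code + ["nu <- t0"]


def translate_set(r: Refinement) -> List[str]:
    if r[0] == "pred":                    # known refinement: assert the upper bound
        return [f"assert({r[1]})"]
    _, k, ys = r                          # unknown refinement: tuple-set
    return [f"{k} <- ({', '.join(['nu'] + ys)})"]


def translate_env(G: List[Binding]) -> List[str]:
    code: List[str] = []
    for x, r in G:                        # each binding is a "get" followed by x <- nu
        code += translate_get(r) + [f"{x} <- nu"]
    return code + ["skip"]                # the empty environment translates to skip


def translate_constraint(c: Constraint) -> List[str]:
    G, t1, t2 = c
    return translate_env(G) + translate_get(t1) + translate_set(t2)


def translate(cs: List[Constraint]) -> str:
    blocks = ["  " + "; ".join(translate_constraint(c)) for c in cs]
    return "loop{\n" + "\n  []\n".join(blocks) + "\n}"


# Two constraints over a unary refinement variable k (no pending substitutions).
c1: Constraint = ([], ("pred", "true"), ("kvar", "k", []))
c2: Constraint = ([("x", ("kvar", "k", [])), ("y", ("kvar", "k", []))],
                  ("pred", "true"), ("pred", "x = y"))

if __name__ == "__main__":
    print(translate([c1, c2]))
```

The temporaries t0, ..., tn are reused by every tuple-get, which is harmless because each get is immediately followed by the assignments that consume them.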

The correctness of the procedure is stated by the following 
theorem. 

Theorem 1. $C$ is satisfiable iff $[\![C]\!]$ is Relational-safe.

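Common Operations

$$\begin{aligned}
\mathrm{Post}^\sharp(\mathcal{E}, I) &= \{\mathcal{E}\} \\
\mathrm{Post}^\sharp(s^\sharp, I_1; I_2) &= \mathrm{Post}^\sharp(\mathrm{Post}^\sharp(s^\sharp, I_1), I_2) \\
\mathrm{Post}^\sharp(s^\sharp, x \leftarrow e) &= \{s^\sharp[x \mapsto s^\sharp(e)]\} \\
\mathrm{Post}^\sharp(s^\sharp, x \leftarrow \mathtt{nondet}()) &= \{s^\sharp[x \mapsto c] \mid c \in V\} \\
\mathrm{Post}^\sharp(s^\sharp, \mathtt{assume}(p)) &= \begin{cases} \{s^\sharp\} & \text{if } s^\sharp(p) = \mathit{true} \\ \emptyset & \text{otherwise} \end{cases} \\
\mathrm{Post}^\sharp(s^\sharp, \mathtt{assert}(p)) &= \begin{cases} \{s^\sharp\} & \text{if } s^\sharp(p) = \mathit{true} \\ \{\mathcal{E}\} & \text{otherwise} \end{cases}
\end{aligned}$$

Tuple Operations: Relational Semantics

$$\begin{aligned}
\mathrm{Post}^\sharp(s^\sharp, (t_0, \ldots, t_n) \leftarrow \kappa) &= \{s^\sharp[t_0 \mapsto v_0] \ldots [t_n \mapsto v_n] \mid (v_0, \ldots, v_n) \in s^\sharp(\kappa)\} \\
\mathrm{Post}^\sharp(s^\sharp, \kappa \leftarrow (x_0, \ldots, x_n)) &= \{s^\sharp[\kappa \mapsto s^\sharp(\kappa) \cup \{(s^\sharp(x_0), \ldots, s^\sharp(x_n))\}]\}
\end{aligned}$$

Tuple Operations: Imperative Semantics

$$\begin{aligned}
\mathrm{Post}(s, (t_0, \ldots, t_n) \leftarrow \kappa) &= \begin{cases} \{s[t_0 \mapsto v_0] \ldots [t_n \mapsto v_n]\} & \text{if } s(\kappa) = (v_0, \ldots, v_n) \\ \emptyset & \text{if } s(\kappa) = \bot \end{cases} \\
\mathrm{Post}(s, \kappa \leftarrow (x_0, \ldots, x_n)) &= \{s[\kappa \mapsto (s(x_0), \ldots, s(x_n))]\}
\end{aligned}$$

Figure 6. Relational and Imperative Semantics: Other cases of Post identical to $\mathrm{Post}^\sharp$

The proof of this theorem follows from the properties of the
following function $\alpha$ that maps a set $\hat\Sigma^\sharp \subseteq \Sigma^\sharp$ of Relational-states
to constraint solutions:

$$\alpha(\hat\Sigma^\sharp) = \lambda\kappa.\ \bigcup \{s^\sharp(\kappa) \mid s^\sharp \in \hat\Sigma^\sharp\}$$
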



The function $\alpha$ relates the satisfying solutions of the constraints
to the Relational-reachable states of the translated program.
Theorem 1 follows from the following two observations, each of which
can be proven by induction on the construction of $\mathrm{Reach}^\sharp$.
If $S$ satisfies $C$ then $\alpha(\mathrm{Reach}^\sharp([\![C]\!]))(\kappa) \subseteq S(\kappa)$ for
all $\kappa$. If $\mathcal{E} \notin \mathrm{Reach}^\sharp([\![C]\!])$ then $\alpha(\mathrm{Reach}^\sharp([\![C]\!]))$ satisfies $C$.

5.2 Read-Write-Once Programs

At this point, via Theorem 1, we have reduced checking satisfiability
of type constraints to the problem of verifying assertions of
Imp programs under the (non-standard) Relational semantics.
Unfortunately, under these semantics, the program contains variables
($\kappa$) which range over sets of tuples. This makes it inconvenient to
directly apply abstract-interpretation based techniques for imperative
programs which typically assume the (standard) Imperative
semantics; each technique has to be painstakingly adapted to the
non-standard semantics.

We would be home and dry if we could prove the equivalence
of the Relational and Imperative semantics; that is, if we could
show that an IMP program is Relational-safe if and only if it is
Imperative-safe. Unfortunately, this is not true.

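Example. Consider the IMP program:

$$\begin{array}{ll}
\mathtt{loop}\{ & \nu \leftarrow \mathtt{nondet}();\ \kappa \leftarrow (\nu) \\
\quad [] & (t_0) \leftarrow \kappa;\ x \leftarrow t_0;\ (t_0) \leftarrow \kappa;\ y \leftarrow t_0;\ \mathtt{assert}(x = y) \\
\} &
\end{array}$$
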

This program is not Relational-safe as the set-operation in the first
instruction populates $\kappa$ with the set of all integers, and the
get-operations in the second instruction can assign different
integer values to $x$ and $y$. However the program is Imperative-safe
as whenever the second instruction executes, $\kappa$ will be undefined
or contain some arbitrary integer that is assigned to both $x$ and $y$,
which causes the assert to succeed.

This example pinpoints exactly why the two semantics differ. In
the Relational semantics, in any given loop iteration, different gets
on the same $\kappa$ can return different tuples, while in the Imperative
semantics the gets are correlated and return the same tuple.
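The difference between the two semantics can be observed by executing the example under both. The sketch below is a hypothetical Python interpreter written for this illustration only: it replaces the integers by the two-element domain {0, 1} so that the reachable state space is finite, supports only the instruction forms the example needs (havoc, variable copy, tuple get and set, and an equality assert), and uses made-up names such as `post`, `run_block` and `reachable`.

```python
V = (0, 1)            # finite stand-in for the integers (assumption of this sketch)
ERR = "error"         # the error state


def freeze(d):
    """Hashable snapshot of a state; keys are unique, so values are never compared."""
    return tuple(sorted(d.items()))


def post(state, instr, relational):
    """Successor states of one atomic instruction under either semantics."""
    if state == ERR:
        return {ERR}
    s = dict(state)
    op = instr[0]
    if op == "havoc":                     # x <- nondet()
        return {freeze({**s, instr[1]: c}) for c in V}
    if op == "copy":                      # x <- y
        return {freeze({**s, instr[1]: s[instr[2]]})}
    if op == "set":                       # k <- (x0, ..., xn)
        _, k, xs = instr
        tup = tuple(s[x] for x in xs)
        new = (s[k] | {tup}) if relational else tup
        return {freeze({**s, k: new})}
    if op == "get":                       # (t0, ..., tn) <- k
        _, ts, k = instr
        tuples = s[k] if relational else ([] if s[k] is None else [s[k]])
        return {freeze({**s, **dict(zip(ts, tup))}) for tup in tuples}
    if op == "assert_eq":                 # assert(x = y)
        return {state} if s[instr[1]] == s[instr[2]] else {ERR}
    raise ValueError(op)


def run_block(states, block, relational):
    for instr in block:
        states = set().union(*(post(s, instr, relational) for s in states))
    return states


def reachable(program, base_vars, rel_vars, relational):
    init = {x: V[0] for x in base_vars}
    init.update({k: frozenset() if relational else None for k in rel_vars})
    seen, frontier = set(), {freeze(init)}
    while frontier:                       # terminates because the state space is finite
        seen |= frontier
        nxt = set()
        for block in program:
            nxt |= run_block(frontier, block, relational)
        frontier = nxt - seen
    return seen


program = [
    [("havoc", "nu"), ("set", "k", ["nu"])],
    [("get", ["t0"], "k"), ("copy", "x", "t0"),
     ("get", ["t0"], "k"), ("copy", "y", "t0"),
     ("assert_eq", "x", "y")],
]
base, rels = ["nu", "t0", "x", "y"], ["k"]

if __name__ == "__main__":
    print("Relational-safe:", ERR not in reachable(program, base, rels, relational=True))
    print("Imperative-safe:", ERR not in reachable(program, base, rels, relational=False))
```

Under the Relational semantics two iterations of the first block put both (0) and (1) into k, after which the two gets may disagree; under the Imperative semantics k holds a single tuple, so both gets return it.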



Read-Write-Once Programs. An Imp instruction is a read-write-once
instruction if any relation variable $\kappa$ is read from and written
to at most once in the instruction. That is, read-write-once means
at most one write and at most one read (and not at most one read
or write). An IMP program is a read-write-once program if each
instruction in its loop is a read-write-once instruction. We can
show that for read-write-once IMP programs the Relational and
Imperative semantics are equivalent.
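The read-write-once condition is a purely syntactic check. A minimal Python sketch follows, assuming a hypothetical encoding in which an instruction block is a list of tuples and tuple-gets and tuple-sets are tagged "get" and "set"; the function names are illustrative only.

```python
from collections import Counter


def is_read_write_once(block):
    """True if every relation variable is read at most once and written at most once."""
    reads, writes = Counter(), Counter()
    for instr in block:
        if instr[0] == "get":             # ("get", [t0, ..., tn], k)
            reads[instr[2]] += 1
        elif instr[0] == "set":           # ("set", k, [x0, ..., xn])
            writes[instr[1]] += 1
    return all(n <= 1 for n in reads.values()) and all(n <= 1 for n in writes.values())


def is_read_write_once_program(program):
    """A program is read-write-once if each block of its loop is."""
    return all(is_read_write_once(block) for block in program)


writer = [("havoc", "nu"), ("set", "k", ["nu"])]
double_reader = [("get", ["t0"], "k"), ("copy", "x", "t0"),
                 ("get", ["t0"], "k"), ("copy", "y", "t0")]
read_then_write = [("get", ["t0"], "k"), ("set", "k", ["t0"])]

if __name__ == "__main__":
    print(is_read_write_once(writer), is_read_write_once(double_reader),
          is_read_write_once(read_then_write))
```

A block that reads and writes the same variable once each is accepted, matching the remark that the bound applies to reads and writes separately.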

Theorem 2. If $P$ is a read-write-once IMP program then $P$ is
Relational-safe iff $P$ is Imperative-safe.

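To prove this theorem, we formalize the connection between the
reachable states under the two different semantics, using the
function Expand, which maps a Relational-state to a set of Imperative
states:

$$\mathrm{Expand}(s^\sharp) = \left\{ s \;\middle|\; \begin{array}{ll}
s(x) = s^\sharp(x) & \text{for base variables } x \\
s(\kappa) = \bar{v} & \text{if } \bar{v} \in s^\sharp(\kappa) \\
s(\kappa) = \bot & \text{if } s^\sharp(\kappa) = \emptyset \\
s = \mathcal{E} & \text{if } s^\sharp = \mathcal{E}
\end{array} \right\}$$

We lift the function to sets of Relational states in the natural way:

$$\mathrm{Expand}(\hat\Sigma^\sharp) = \bigcup \{\mathrm{Expand}(s^\sharp) \mid s^\sharp \in \hat\Sigma^\sharp\}$$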

Next, we can show that read-write-once instructions enjoy the
following property, by case splitting on the form of $I$.

Lemma 1. [Step] If $I$ is a read-write-once instruction then
$\mathrm{Expand}(\mathrm{Post}^\sharp(s^\sharp, I)) = \mathrm{Post}(\mathrm{Expand}(s^\sharp), I)$.

We use this property to show that the reachable states under the
different semantics are equivalent.

Lemma 2. If $P = \mathtt{loop}\{I_1 \;[]\; \ldots \;[]\; I_n\}$ is a read-write-once
program, then $\mathrm{Expand}(\mathrm{Reach}^\sharp(P)) = \mathrm{Reach}(P)$.

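Proof. To prove that $\mathrm{Reach}(P) \subseteq \mathrm{Expand}(\mathrm{Reach}^\sharp(P))$, we show

$$\forall m : \mathrm{Reach}(P, m) \subseteq \mathrm{Expand}(\mathrm{Reach}^\sharp(P))$$

by straightforward induction on $m$, noting that $s_0 \in \mathrm{Expand}(s_0^\sharp)$,
and $\mathrm{Post}(\mathrm{Expand}(s^\sharp), I) \subseteq \mathrm{Expand}(\mathrm{Post}^\sharp(s^\sharp, I))$ for any Relational-state
$s^\sharp \in \Sigma^\sharp$, instruction $I$, and any program $P$ (not necessarily
read-write-once).

To show inclusion in the other direction, we prove

$$\forall m : \mathrm{Expand}(\mathrm{Reach}^\sharp(P, m)) \subseteq \mathrm{Reach}(P)$$

by induction on $m$. For the base case,

$$\mathrm{Expand}(\mathrm{Reach}^\sharp(P, 0)) = \mathrm{Reach}(P, 0) \subseteq \mathrm{Reach}(P)$$

by the definition of the initial states. By induction, assume that

$$\mathrm{Expand}(\mathrm{Reach}^\sharp(P, m)) \subseteq \mathrm{Reach}(P)$$

Let $s' \in \mathrm{Expand}(\mathrm{Reach}^\sharp(P, m + 1))$. By Lemma 1 either $s'$ is
already in $\mathrm{Expand}(\mathrm{Reach}^\sharp(P, m))$, in which case the inductive hypothesis
applies and hence $s' \in \mathrm{Reach}(P)$, or

$$s' \in \mathrm{Post}(\mathrm{Expand}(\mathrm{Reach}^\sharp(P, m)), I_j)$$

for some $j$. That is, there is an $s \in \mathrm{Expand}(\mathrm{Reach}^\sharp(P, m))$ such that
$s' \in \mathrm{Post}(s, I_j)$. From the induction hypothesis $s \in \mathrm{Reach}(P)$. As
$\mathrm{Reach}(P)$ is closed under Post, we conclude $s' \in \mathrm{Reach}(P)$. □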

5.3 Cloning 

At this point, we have shown that the Imperative semantics of
read-write-once programs are equivalent to the Relational semantics. All
that remains is to show that the translation procedure of Figure 7
produces read-write-once programs. Unfortunately, this is not true.

Example. Consider the following constraints:

$$\emptyset \vdash \{\kappa\}, \qquad \emptyset \vdash \{\mathit{true}\} <: \{\kappa\}, \qquad x{:}\kappa;\ y{:}\kappa \vdash \{\mathit{true}\} <: \{x = y\}$$

It is easy to check that on the above constraints, the translation
procedure yields the IMP program from the previous example,
which is not read-write-once.

The reason the translated program is not a read-write-once
program is that there can be constraints $G \vdash T_1 <: T_2$ in which $\kappa$
occurs in multiple places within $G$ and $T_1$.

To solve this problem, we can simply clone the $\kappa$ variables that
occur multiple times inside a constraint, and use a different clone at
each occurrence. We formalize this as a procedure Clone that maps
a finite set of constraints to another finite set. The procedure works
as follows. For each $\kappa$ that is read up to $n$ times in some constraint,
we make $n$ clones, $\kappa^1, \ldots, \kappa^n$, and

1. for the $i^{th}$ occurrence of $\kappa$ within any constraint, we use the $i^{th}$
clone $\kappa^i$ (instead of $\kappa$), and,
2. for each constraint where $\kappa$ appears on the right hand side,
we make $n$ clones of the constraint, where in the $i^{th}$ cloned
constraint, we use $\kappa^i$ (instead of $\kappa$).

The first step ensures that each $\kappa$ is read once in any constraint,
and the second step ensures that the clones correspond to exactly
the same set of tuples as the original variable $\kappa$. We can prove that
Clone enjoys the following properties.

Theorem 3. Let $C$ be a finite set of constraints.

1. $[\![\mathrm{Clone}(C)]\!]$ is a read-write-once program.

2. $\mathrm{Clone}(C)$ is satisfiable iff $C$ is satisfiable.
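The two steps of the Clone procedure can be sketched on subtyping constraints as follows. This Python sketch is a simplified illustration under stated assumptions: refinements use a hypothetical tuple encoding, clones are named by appending "^i" to the variable name, variables read at most once per constraint are left unchanged (one clone, identified with the original), and well-formedness constraints are not handled.

```python
from collections import Counter


def kvar(r):
    """Name of the refinement variable in r, or None for a known predicate."""
    return r[1] if r[0] == "kvar" else None


def reads(c):
    """Refinements that are read by a constraint G |- T1 <: T2: bindings of G, then T1."""
    G, t1, _ = c
    return [r for _, r in G] + [t1]


def clone(cs):
    # n[k] = largest number of reads of k within a single constraint.
    n = Counter()
    for c in cs:
        per_constraint = Counter(kvar(r) for r in reads(c) if kvar(r))
        for k, m in per_constraint.items():
            n[k] = max(n[k], m)

    out = []
    for G, t1, t2 in cs:
        seen = Counter()

        def rename(r):
            # Step 1: the i-th read of k in this constraint uses the i-th clone.
            k = kvar(r)
            if k is None or n[k] <= 1:
                return r
            seen[k] += 1
            return ("kvar", f"{k}^{seen[k]}", r[2])

        G2 = [(x, rename(r)) for x, r in G]
        t1_2 = rename(t1)
        k = kvar(t2)
        if k is not None and n[k] > 1:
            # Step 2: one copy of the constraint per clone on the right hand side.
            out += [(G2, t1_2, ("kvar", f"{k}^{i}", t2[2])) for i in range(1, n[k] + 1)]
        else:
            out.append((G2, t1_2, t2))
    return out


cs = [
    ([], ("pred", "true"), ("kvar", "k", [])),
    ([("x", ("kvar", "k", [])), ("y", ("kvar", "k", []))],
     ("pred", "true"), ("pred", "x = y")),
]

if __name__ == "__main__":
    for c in clone(cs):
        print(c)
```

On the example constraints the writer of k is duplicated into writers of k^1 and k^2, and the two reads in the second constraint are directed to different clones, so each resulting constraint reads every variable at most once.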

It is easy to verify that $[\![\mathrm{Clone}(C)]\!]$ is a read-write-once
program. Furthermore, any satisfying solution for the original
constraints can be mapped directly to a solution for the cloned
constraints. To go in the other direction, we must map a solution that
satisfies the cloned constraints to one that satisfies the original
constraints. This is trivial if the solution for the cloned constraints
maps each clone to the same set of tuples. We show that if the
cloned constraints have a satisfying solution, they have a solution
that satisfies the above property. To this end, we prove the following
lemma that states that for any set of constraints, the satisfying
solutions are closed under intersection.



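| Program | Time (sec) | Invariant | Refinement Types |
|---|---|---|---|
| max | 0.091 | $\kappa_1.1 \le \kappa_1.0 \land \kappa_1.2 \le \kappa_1.0$ | $\kappa_x = \mathit{true}$, $\kappa_y = \mathit{true}$, $\kappa_1 = x \le \nu \land y \le \nu$ |
| sum | 0.071 | $0 \le \kappa_2.0 \land \kappa_2.1 \le \kappa_2.0$ | $\kappa_k = \mathit{true}$, $\kappa_2 = 0 \le \nu \land k \le \nu$ |
| foldn | 0.060 | $0 \le \kappa_i.0 \land 0 \le \kappa_3.0 \land \kappa_3.0 < \kappa_3.2$ | $\kappa_i = 0 \le \nu$, $\kappa_3 = 0 \le \nu \land \nu < n$ |
| arraymax | 0.135 | $0 \le \kappa_4.0 \land 0 \le \kappa_5.0 \land 0 \le \kappa_6.0 \land \kappa_6.0 < \mathrm{len}(\kappa_6.1)$ | $\kappa_4 = 0 \le \nu$, $\kappa_5 = 0 \le \nu$, $\kappa_6 = 0 \le \nu \land \nu < \mathrm{len}(a)$ |
| mask | 0.098 | $\kappa_1.0 < \mathrm{len}(\kappa_1.4) \land \kappa_1.1 < \kappa_1.0 \land 0 \le \kappa_2.0 \land \kappa_2.0 < \mathrm{len}(\kappa_2.3)$ | $\kappa_1 = \nu < \mathrm{len}(xs) \land i < \nu$, $\kappa_2 = 0 \le \nu \land \nu < \mathrm{len}(a)$ |
| samples | 0.117 | $0 \le \kappa_2.0 \land \kappa_2.0 < \mathrm{len}(\kappa_2.4) \land 0 \le \kappa_3.0 \land \kappa_3.0 < \mathrm{len}(\kappa_3.3) \land 0 \le \kappa_6.0$ | $\kappa_2 = 0 \le \nu \land \nu < \mathrm{len}(b)$, $\kappa_3 = 0 \le \nu \land \nu < \mathrm{len}(a)$, $\kappa_6 = 0 \le \nu$ |

Table 1. Experimental evaluation using a predicate abstraction-based
verification tool on examples from [29]. The third column
presents the invariant for the translated program, and the fourth
column the resulting refinement types.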



Lemma 3. If $S_1$ and $S_2$ are solutions that satisfy $C$ then
$S_1 \cap S_2 = \lambda\kappa.\ S_1(\kappa) \cap S_2(\kappa)$ satisfies $C$.

Thus if $S$ satisfies the cloned constraints then by symmetry and
Lemma 3 the solution that maps each cloned variable to $\bigcap_{i=1}^{n} S(\kappa^i)$
also satisfies the cloned constraints, and hence, directly yields a
solution to the original constraints.
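The pointwise intersection of Lemma 3 and the way it collapses the clones of a variable can be written down directly. In this Python sketch a solution is assumed to be a dictionary from variable names to finite sets of tuples, and `clones_of` is a hypothetical map from each original variable to the names of its clones; neither representation comes from the implementation described here.

```python
from functools import reduce


def intersect(s1, s2):
    """Pointwise intersection of two solutions over the same refinement variables."""
    return {k: s1[k] & s2[k] for k in s1}


def merge_clones(solution, clones_of):
    """Map each original variable to the intersection of its clones' relations."""
    return {k: reduce(lambda a, b: a & b, (solution[c] for c in names))
            for k, names in clones_of.items()}


cloned_solution = {
    "k^1": frozenset({(0,), (1,)}),
    "k^2": frozenset({(1,), (2,)}),
}

if __name__ == "__main__":
    print(merge_clones(cloned_solution, {"k": ["k^1", "k^2"]}))
```

The sketch only computes the candidate solution; that the result still satisfies the constraints is the content of Lemma 3 and is not checked by the code.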

Finally, as a corollary of Theorems 1, 2 and 3 we get our main result
that reduces the question of refinement type constraint satisfaction
to that of safety verification.

Theorem 4. $C$ is satisfiable iff $[\![\mathrm{Clone}(C)]\!]$ is Imperative-safe.

While we state Theorems 1 and 3 as preserving satisfiability, the
proof shows how the solutions can be effectively mapped between
$C$ and $[\![C]\!]$ (or $[\![\mathrm{Clone}(C)]\!]$). In particular, while the intersection
of two non-trivial solutions can be a trivial solution, it is
guaranteed that in that case, the trivial solution satisfies $C$. Stated in
terms of invariants, Lemma 3 states the observation that there
may be several non-comparable inductive invariants to prove a
safety property, but in that case, the intersection of all the inductive
invariants is also an inductive invariant.



6. Experiments 

We have implemented a verification tool for OCAML programs
based on RTI. We use the liquid types infrastructure implemented
in Dsolve [29] to generate refinement type constraints from
OCAML programs. We use ARMC [28], a software model checker
using predicate abstraction and interpolation-based refinement, as
the verifier for the translated imperative program.

Table 1 shows the results of running our tool on a suite of small
OCAML examples from [29]. For array manipulating programs, the
safety objective is to prove array accesses are within bounds. For
MAX we prove that the output is larger than input values. For SUM
we prove that the sum is larger than the largest summation term.

Table 2 presents the running time of our tool on the benchmark
programs for the Depcegar verifier [31]. We observe that, despite
our blackbox treatment of ARMC as a constraint solver, we obtain
competitive running times compared to Depcegar on most of the
examples (Depcegar uses a customized procedure for unfolding
constraints and creating interpolation queries that yield refinement
types).

| Program | Time | # iterations | # predicates |
|---|---|---|---|
| boolflip.ml | 2.17s | 7 | 21 |
| sum.ml | 0.24s | 5 | 14 |
| sum-acm.ml | 0.11s | 1 | 3 |
| sum-all.ml | 3.51s | 10 | 26 |
| mult.ml | 4.67s | 10 | 25 |
| mult-cps.ml | 780.24s | 11 | 27 |
| mult-all.ml | 18.44s | 9 | 24 |
| boolflip-e.ml | 0.65s | | |
| sum-e.ml | 0.01s | | |
| sum-acm-e.ml | 0.02s | | |
| sum-all-e.ml | 0.79s | | |
| mult-e.ml | 0.01s | | |
| mult-cps-e.ml | 7.69s | | |
| mult-all-e.ml | 144.93s | | |

Table 2. Experimental evaluation of our tool on Depcegar benchmarks
[31]. The third column presents the number of abstraction
refinement iterations required by ARMC. The last column gives the
number of predicates discovered by ARMC. For the programs with
suffix "-e", which are incorrect, we omit the number of iterations
and predicates and only show the time required by ARMC to find
a counterexample.

Most of the predicates discovered by the interpolation-based
abstraction refinement procedure implemented in ARMC fall into
the fragment "two variables per inequality." The example MASK
required a predicate that refers to three variables, see $\kappa_1$. While our
initial experiments used a CEGAR-based tool, we expect optimized
abstract interpreters for numerical domains to also work well for
this class of properties.

7. Extensions and Related Work 
7.1 Completeness 

The soundness of safety verification for higher-order programs for
any domain follows from the soundness of constraint generation
(e.g., Theorem 1 in [29]) and Theorem 4. Since the safety
verification problem for higher-order programs is undecidable, the
technique cannot be complete in general. Even in the finite-state case,
in which each base type has a finite domain (e.g., booleans),
completeness depends on the generation of type constraints. For
example, in our examples and in our implementation, we have assumed a
context insensitive constraint generation from program syntax, i.e.,
we have not distinguished the types of the same function at different
call points. This entails a loss of information, as the following
example demonstrates. Consider

let check f x y = assert (f x = y) in 
check (fun a -> a) false false ; 
check (fun a -> not a) false true 

where the builtin function assert has the type $\{\nu{:}\mathtt{bool} \mid \nu\} \to \mathtt{unit}$.
The refinement template for check generated by our constraint
generation process is

$$(x{:}\{\nu{:}\mathtt{bool} \mid \kappa_1\} \to \{\kappa_2\}) \to \{\kappa_3\} \to \{\kappa_4\} \to \mathtt{unit}$$

which is too weak to show that the program is safe. This is because
the template "merges" the two call sites for check.

One way to get context sensitivity is through intersection types
[12, 14, 20, 25]. For the above example, we can show type safety
using the following refined type for check:

$$\begin{array}{ll}
 & (x{:}\mathtt{bool} \to \{\nu = x\}) \to \{\lnot\nu\} \to \{\lnot\nu\} \to \mathtt{unit} \\
\land & (x{:}\mathtt{bool} \to \{\nu = \lnot x\}) \to \{\lnot\nu\} \to \{\nu\} \to \mathtt{unit}
\end{array}$$



It is important to note that Theorems 1 and 2 hold for any set
of constraints. Thus, one way to get completeness in the finite
state case is to generate refinement templates using intersection
types, perform the translation to IMP programs, and then use a
complete invariant generation technique for finite state systems.
The key observation (made in [20]) that ensures a finite number
of constraints is that there is at most a finite number of "contexts"
in the finite state case, and hence a finite number of terms in the
intersection types. The bad news is that the bound on the number
of contexts is $\exp_n(k)$, where $n$ is the highest order of any function
in the program, $k$ is the maximum arity of any function in the
program, and $\exp_n(k)$ is a stack of $n$ exponentials, defined by
$\exp_0(k) = k$, and $\exp_{n+1}(k) = 2^{\exp_n(k)}$.
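To get a feel for how quickly this bound grows, the tower of exponentials can be evaluated for small arguments. The Python function below follows the two defining equations directly; the name `exp_tower` is chosen for this illustration.

```python
def exp_tower(n: int, k: int) -> int:
    """exp_0(k) = k and exp_{n+1}(k) = 2 ** exp_n(k)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = k
    for _ in range(n):
        result = 2 ** result
    return result


if __name__ == "__main__":
    for n in range(4):
        print(n, exp_tower(n, 2))
```

For k = 2 the values for n = 0, 1, 2, 3 are 2, 4, 16 and 65536, and the next value already has 19729 decimal digits.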

Fully context-sensitive constraints are used in [20] to show
completeness in the finite case, at the price of $\exp_n(k)$ in every case,
not just the worst case. In our exposition and our implementation,
we have traded off precision for scalability: while we lose
precision by generating context-insensitive constraints, we avoid the
$\exp_n$ blow-up that comes with full context sensitivity. However, it
has been shown through practical benchmarks that since the types
themselves capture relations between the inputs and outputs, the
context-insensitive constraint generation suffices to prove a variety
of complex programs safe [3, 18, 29].

When considering completeness properties in special cases, we
point out completeness wrt. the discovery of refinement predicates
in octagons/difference bounds abstract domains [24] and
template-based invariant generation for linear arithmetic [7] and extensions
with uninterpreted function symbols [5], which carries over from
respective verification approaches.

7.2 Related Work 

Higher-Order Programs. Kobayashi [20, 21] gives an algorithm
for model checking arbitrary $\mu$-calculus properties of finite-data
programs with higher order functions by a reduction to model
checking for higher-order recursion schemes (HORS) [26]. For
safety verification, RTI shows a promising alternative.

First, the reduction to HORS critically depends on a finite-state
abstraction of the data. In contrast, our reduction defers the data
abstraction to the abstract interpreter working on the imperative
program, thus enabling the direct application of abstract interpreters
working over infinite domains. Since abstract interpreters over
infinite abstract domains are strictly more powerful than (infinite
families of) finite ones [8], our approach can be strictly more powerful
for infinite-state programs.

Second, in the translation of an abstracted program to a HORS,
this algorithm eliminates Boolean variables by enumerating all
possible assignments to them, giving an exponential blow-up from
the program to the HORS. In contrast, our technique preserves the
Boolean state symbolically, enabling the use of efficient symbolic
algorithms for verification. For example, consider the simple program:

let f b1 ... bn x =
  if (b1 || ... || bn) then lock x;
  if (b1 || ... || bn) then unlock x
in f (*) ... (*) (newlock ())

where we wish to prove that lock and unlock alternate. Kobayashi's
translation [20] gives an exponential sized HORS, with a version of
f for each assignment to b1, ..., bn. In contrast, our reduction
preserves the source-level expressions and is linear, and amenable to
symbolic verification techniques (e.g., BDDs). Previous experience
with software model checking [2, 16, 17] shows that the number of
reachable states is often drastically smaller than $2^p$ where $p$ is the
number of Booleans. Thus, the pre-processing step that enumerates
Booleans may not lead to a scalable implementation.



Might [23] describes logic-flow analysis, a general safety
verification algorithm for higher-order languages, which is the product of
a k-CFA like call-strings analysis and a form of SMT-based predicate
abstraction (together with widening). In contrast, our work
shows how higher-order languages can be analyzed directly via
abstract analyses designed for first-order imperative languages.

Inference of refinement types using counterexample-guided
techniques was recently identified as a promising direction [31, 32]. In
contrast, our approach is not limited to CEGAR and facilitates the
applicability of a wide range of abstract interpretation techniques for
precise reasoning about program data.

Software Verification. This work was motivated by the recent
success in software model checking for first-order imperative
programs [2, 6, 16, 22], and the desire to apply similar techniques to
modern programming languages with higher order functions. Our
starting point was refinement types [14, 19], implemented in
dependent ML [33] to give strong static guarantees, and the work on
liquid types [18, 29] that applied predicate abstraction to infer
refinement types. By enabling the application of automatic invariant
generation from software model checking, RTI reduces the need
for programmer annotations in refinement type systems.